
Chapter 4 Low-Level Feature Extraction and Edge Detection: CHAPTER4.MCD

Feature Extraction & Image Processing,

M. S. Nixon and A. S. Aguado

Elsevier/Butterworth-Heinemann, 2nd Edition, 2007

This worksheet is the companion to Chapter 4 and implements the techniques for low-level feature extraction and edge detection described therein. The worksheet follows the text directly and allows you to process synthetic images and the eye image.

In this Chapter we'll first look at edge detection, which is the way we find the boundaries (or edges) of features (or objects). This is similar to human vision, where the change in colour (or intensity) reveals the edge of an object. We'll use the image of a square as our starting point.

We'll use some operators from Chapter 3. If you want to write images out or to see them as pictures, you'll need to normalise them:

and sometimes we'll want to threshold them

and we'll often need to clear them

The basic edge detection operator is the modulus of the difference between adjacent pixels, to detect any change in intensity between them. We can't process the picture border so we'll end up with a (slightly) smaller picture.

The x operator differences horizontally adjacent pixels to emphasise vertical edges

and we'll need the averaging operator

So let's find them:

The y operator differences vertically adjacent pixels to emphasise horizontal edges

So let's find them:

The results of applying these operators are images which show up vertical (for edge_x) and horizontal (for edge_y) differences in intensity
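If you'd like to follow along outside Mathcad, here's a minimal NumPy sketch of these two first-order operators (the Python rendering and the function names edge_x and edge_y are ours, not the worksheet's):

import numpy as np

def edge_x(pic):
    # difference horizontally adjacent pixels: emphasises vertical edges;
    # the result is one column narrower than the input
    return np.abs(pic[:, 1:].astype(int) - pic[:, :-1].astype(int))

def edge_y(pic):
    # difference vertically adjacent pixels: emphasises horizontal edges;
    # the result is one row shorter than the input
    return np.abs(pic[1:, :].astype(int) - pic[:-1, :].astype(int))

# a small synthetic square on a dark background, as in the worksheet
pic = np.zeros((8, 8), dtype=np.uint8)
pic[2:6, 2:6] = 100
print(edge_x(pic))
print(edge_y(pic))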

Output the picture

Adding the horizontal and vertical edges gives all the edges

Here we get the vertical edges, but lose the right hand column of the image

We now see all the edges in the picture, horizontal and vertical, but notice the missing top left hand corner.

Here we get the horizontal edges, but lose the bottom row of the image

The Roberts operator was an early edge detection operator, based on this idea. It requires two templates:

The Roberts operator evaluates the edge magnitude as the maximum of M_plus and M_minus

So our operator is:

NW - SE template

NE - SW template

Notice that top left hand corner edge now appears, due to the diagonal nature of the Roberts operator
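For reference, a minimal sketch of the Roberts cross in the same NumPy style (the function name is ours):

import numpy as np

def roberts(pic):
    p = pic.astype(int)
    m_plus = np.abs(p[:-1, :-1] - p[1:, 1:])    # NW - SE template
    m_minus = np.abs(p[:-1, 1:] - p[1:, :-1])   # NE - SW template
    return np.maximum(m_plus, m_minus)          # edge magnitude as the maximum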

You don't need to store the two (intermediate) templates in application code; just the final result will do. But the Roberts operator has no inbuilt noise smoothing. Also, it's more usual to combine horizontal and vertical templates using a vector format:

Here, the edge magnitude is the length of the vector (which still doesn't show the top left corner). The edge direction is the direction of the vector; we'll come to it later.
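A sketch of the vector format, continuing the NumPy rendering above:

import numpy as np

def roberts_vector(pic):
    p = pic.astype(float)
    m_plus = p[:-1, :-1] - p[1:, 1:]    # signed diagonal differences
    m_minus = p[:-1, 1:] - p[1:, :-1]
    return np.hypot(m_plus, m_minus)    # edge magnitude as the vector length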

The 3*3 Prewitt operator forms the difference between the sum of 3 pixels in the previous row (column) and the sum of the 3 pixels in the following row (column) to highlight horizontal (vertical) edges. This includes smoothing within the edge detection template. The difference between columns 0 and 2 gives the edge data along the x axis as:

Difference between columns 0 and 2

We now have the horizontal and vertical differences. Since these are orthogonal, we do not need to add the templates together or determine their maximum (as in the Roberts operator); instead, we combine them vectorially to give the edge magnitude and direction by

Pythagoras

Vertical edge, y positive

Vertical edge, y negative

Edge direction is in degrees

So the whole operator is:

Magnitude

Direction
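Outside Mathcad, the whole Prewitt operator might look like the following NumPy sketch (scipy.ndimage is an assumption; note that ndimage.convolve flips the template, which only affects the sign of the direction, not the magnitude):

import numpy as np
from scipy.ndimage import convolve

def prewitt(pic):
    p = pic.astype(float)
    tx = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])            # difference between columns 0 and 2
    ty = tx.T                              # difference between rows 0 and 2
    mx = convolve(p, tx)
    my = convolve(p, ty)
    magnitude = np.hypot(mx, my)                # Pythagoras
    direction = np.degrees(np.arctan2(my, mx))  # edge direction in degrees
    return magnitude, direction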

Displaying these as pictures gives:

An alternative form is to supply the edge data as a vector, rather than in magnitude and direction format.

Note that the edges are now thicker, in the magnitude image, than they were for the basic first order operator. We also have edge direction which gives a measure of the direction of the edge. We can display this using the vector plot facility.

The 3*3 Sobel operator doubles up the central coefficients in the Prewitt operator

To get the general form of the Sobel operator, we perform optimal smoothing in one direction and optimal differencing in the other. We first define the size of the differencing operator, and a pointer within the window

Optimal smoothing is a template whose coefficients are given by Pascal's triangle.

It appears that Mathcad 14 doesn't like this function and has highlighted it in pink.

From these coefficients, we obtain a smoothing template

To get optimal differencing, we subtract two of the smoothing templates from the previous expansion in Pascal's triangle

The optimal differencing template is then

It all works, pink or not...

The Sobel template is then

The general form of the Sobel template operator combines the smoothing and differencing operations on image data, and when applied to the image, is:

And the x template is given by:

The sum of the modulus of the template coefficients can be used to normalise the Sobel outputs

Which gives a value of:

We then divide the template results by this value (but it can make the edge data too small for window sizes of 5 or greater).
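A sketch of this template generation (smoothing from Pascal's triangle, differencing from the previous expansion, then the outer product and the normalisation factor; function names are ours):

import numpy as np
from math import comb

def smoothing(w):
    # optimal smoothing: row w-1 of Pascal's triangle
    return np.array([comb(w - 1, k) for k in range(w)])

def differencing(w):
    # subtract two smoothing templates from the previous expansion
    return np.convolve(smoothing(w - 1), [1, -1])

def sobel_templates(w):
    tx = np.outer(smoothing(w), differencing(w))  # smooth down, difference across
    ty = np.outer(differencing(w), smoothing(w))  # difference down, smooth across
    norm = np.abs(tx).sum()   # sum of the moduli of the template coefficients
    return tx, ty, norm

tx, ty, norm = sobel_templates(3)
print(tx)     # [[ 1  0 -1]  [ 2  0 -2]  [ 1  0 -1]]
print(norm)   # 8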

So the edge direction and magnitude are:

Compare this with the result of the Prewitt operator. You'll see little difference. The improvement in performance is actually associated with optimal smoothing, and this is much better shown in larger (real) images. We'll use Mathcad's picture facility again: remember that if you try different sized images, you'll need to resize the displayed picture.

So let's try it on an eye image:

then apply the Sobel operator:

and then threshold the magnitude image to find edge points which exceed a value of 35:

eye image

edge magnitude

thresholded edge magnitude

It is actually possible to arrange for the gradient direction to point tangentially forwards along or backwards down an edge, or even normal to the edge, by inverting and/or transposing the Sobel templates. This gives four possible arrangements of the edge direction, which for the image of a square are:

We can actually arrange the edge data to be normal to the edge, simply by swapping the edge templates:

Also, let's look at the Fourier transform of a 5*5 Sobel template. The template is:

To display the transform we'll need the rearrange operator:

and we'll need some frequency co-ordinates:

We'll need to rearrange it:

And then take the transform as:

The transform is then shown as a surface plot (transposed to be the same way round as the picture):

and as a picture:
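Outside Mathcad, the same rearrange-and-transform steps might look like this (a NumPy sketch; the picture size and the placement of the template are our choices, and only the magnitude of the spectrum is kept):

import numpy as np

def template_spectrum(template, size=64):
    pic = np.zeros((size, size))
    h, w = template.shape
    pic[:h, :w] = template        # embed the template in a picture
    # rearrange so the spectrum is centred, then take the magnitude
    return np.abs(np.fft.fftshift(np.fft.fft2(pic)))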

The Sobel operator was one of the most sophisticated edge detection approaches until it was supplanted by the Canny operator. Canny optimised the formulation of edge detection. Canny's optimal approach can be achieved by four processing stages:

i) Gaussian smoothing;

ii) Sobel edge detection;

iii) non-maximum suppression; and

iv) hysteresis thresholding.

For reasons of processing speed, we'll redefine the Gaussian and Sobel operators (the generalised Sobel implementation is rather slow, as you probably noticed!). First, we'll use the Gaussian smoothing operator which is implemented using template convolution by

From Chapter 3, our Gaussian template is generated by

and the template convolution operator
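A sketch of the Gaussian template and its application by template convolution (scipy.ndimage is an assumption; the stand-in picture is ours):

import numpy as np
from scipy.ndimage import convolve   # template convolution

def gaussian_template(w, sigma):
    # w x w Gaussian coefficients, normalised to unit sum (as in Chapter 3)
    c = (w - 1) / 2.0
    y, x = np.mgrid[0:w, 0:w]
    t = np.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
    return t / t.sum()

# usage: smooth a picture with a 3x3 Gaussian of standard deviation 0.8
pic = np.random.rand(16, 16)         # stand-in for the eye image
smoothed = convolve(pic, gaussian_template(3, 0.8))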

Remember that Mathcad will give a faster response by Fourier calculation for large template sizes (you'll need the conv operator, and to make up a picture from the Gaussian template). For a Gaussian operator defined by:

This can be done using template pointers, for say:

then

and then call the conv operator (note that you'll have to use the magnitude of the result).

The Sobel implementation we'll use avoids the (slow) call for a submatrix function, by direct implementation of a 3*3 processing template, and returns the vector and the magnitude information (we need the vector information in non-maximum suppression) as:

We now need the non-maximum suppression operator. This gives differentiation perpendicular to the edge as:
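The worksheet's operator interpolates the gradient along the edge normal; a common simplification (not the worksheet's exact interpolation) quantises the gradient direction to the nearest 45 degrees instead. A NumPy sketch of that simplification, assuming mx and my are the Sobel x- and y-differences and mag their magnitude:

import numpy as np

def non_max_suppress(mag, mx, my):
    # keep a point only if its magnitude is a local maximum along the
    # gradient direction, quantised to the nearest of four orientations
    out = np.zeros_like(mag)
    angle = (np.degrees(np.arctan2(my, mx)) + 180.0) % 180.0
    for y in range(1, mag.shape[0] - 1):
        for x in range(1, mag.shape[1] - 1):
            a = angle[y, x]
            if a < 22.5 or a >= 157.5:      # gradient along x
                n1, n2 = mag[y, x - 1], mag[y, x + 1]
            elif a < 67.5:                  # one diagonal
                n1, n2 = mag[y - 1, x - 1], mag[y + 1, x + 1]
            elif a < 112.5:                 # gradient along y
                n1, n2 = mag[y - 1, x], mag[y + 1, x]
            else:                           # the other diagonal
                n1, n2 = mag[y - 1, x + 1], mag[y + 1, x - 1]
            if mag[y, x] >= n1 and mag[y, x] >= n2:
                out[y, x] = mag[y, x]
    return out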

We'll need to check that image co-ordinates are within the current frame, so a logical operator is

Then, for hysteresis thresholding we'll need to analyse neighbourhood connectivity (a recursive loop finding if neighbours are connected to a seed point) as:

Is the point above the lower threshold, unlabelled and in the picture?

It is, so label it as white

and use it as a seed point for further analysis

after analysis, return the new image

The hysteresis thresholding operator first finds a point which exceeds an upper switching threshold (a seed point), and then finds if any of that point's neighbours exceed the lower switching threshold. If they do, they in turn become seed points for further analysis (in the connect operator). So the hyst_thr operator just starts it all off. Its arguments are the (unlabelled) edge image and the upper and lower switching thresholds. It returns an image with (labelled) detected edge points.

Is the point above the upper threshold and not already labelled?

It is, so label it as white

and analyse the neighbours

finally, pass back the completed image
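As a cross-check outside Mathcad, here's a compact sketch of hysteresis thresholding (an explicit stack replaces the worksheet's recursive connect operator; names are ours):

import numpy as np

def hyst_thresh(edges, upper, lower):
    rows, cols = edges.shape
    out = np.zeros_like(edges)
    # seed points: everything above the upper switching threshold
    stack = [(y, x) for y in range(rows) for x in range(cols)
             if edges[y, x] > upper]
    for y, x in stack:
        out[y, x] = 255                       # label seed points as white
    while stack:
        y, x = stack.pop()
        for dy in (-1, 0, 1):                 # 8-connected neighbours
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                # above the lower threshold, unlabelled and in the picture?
                if (0 <= ny < rows and 0 <= nx < cols
                        and out[ny, nx] == 0 and edges[ny, nx] > lower):
                    out[ny, nx] = 255         # it is, so label it as white
                    stack.append((ny, nx))    # and use it as a new seed point
    return out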

So let's see it go. First, we'll choose the variance of the Gaussian operator as:

Now apply a 3*3 Gaussian operator to the image:

Now we compute the edge information, step up Mr. Sobel....

and then find the ridges of edge data by non-maximum suppression

and perform hysteresis thresholding with an upper threshold of 90, and a lower one of 40

Let's see what we've got...

So all that processing was worth it: thin edge lines, in the right place.

Let's see what the non-maximum suppression operator chose, in comparison with the original (uniformly thresholded) Sobel edges:

Uniform thresholding: lower threshold

Uniform thresholding: upper threshold

Uniform thresholding applied to Sobel magnitude

We can now see that uniform thresholding of the conventional Sobel provides too many points. Try selecting a larger value for the threshold (220, say); you'll get fewer points, but the final image will never be as clear as uniform thresholding applied to the non-maximum suppressed image. The function of hysteresis thresholding is to retain all the points above the upper threshold, and some of the (connected) ones above the lower switching threshold.

This is first order detection. We can see this process in one dimensional form by defining a function f as

We can plot it as a function and see that it approximates a (smoothed) step function, or edge.

We then differentiate it to get

We can plot its differentiated form and find that the peak is where the step crosses the horizontal axis: the peak occurs at the position of the edge, as in edge detection. Let's differentiate it again

When we plot this, in its twice differentiated form, we see that the zero crossing occurs where the peak of the first order differentiation occurred, which is exactly where the original function crosses the x axis.
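For readers outside Mathcad, a sketch of this one-dimensional demonstration, using an arctan as the smoothed step (the worksheet's exact f isn't reproduced here):

import numpy as np

x = np.linspace(-5, 5, 201)
f = np.arctan(x)              # a smoothed step: the edge
df = np.gradient(f, x)        # first derivative: peaks at the edge
d2f = np.gradient(df, x)      # second derivative: zero-crossing at the edge

print(x[np.argmax(df)])                        # peak position, close to 0
zc = np.where(np.diff(np.sign(d2f)) != 0)[0]
print(x[zc])                                   # zero-crossing, close to 0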

The edge operators we've studied so far are actually first order operators since they approximate the first order derivative. The Laplacian edge detector is a second order operator. It's a 3*3 template with the coefficients


 0 -1  0
-1  4 -1
 0 -1  0

These coefficients derive from the difference between two adjacent first order differences, giving -1 2 -1, which is then applied vertically and horizontally.


As such, the location of the edge is given by the zero-crossings, rather than the peaks, of the edge detected image.

So a 3*3 Laplacian operator is:
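The same operator as a NumPy sketch (scipy.ndimage is an assumption):

import numpy as np
from scipy.ndimage import convolve

laplacian = np.array([[ 0, -1,  0],
                      [-1,  4, -1],
                      [ 0, -1,  0]])

pic = np.zeros((8, 8)); pic[2:6, 2:6] = 100.0
second_order = convolve(pic, laplacian)   # edges lie at the zero-crossings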

In Marr-Hildreth, we'll need a mask normalisation function:

And you can see how the actual edge points lie between the zero-crossings.

The Marr-Hildreth operator is the most popular second order operator. To implement the Marr-Hildreth operator, we need a template mask which contains the Laplacian of Gaussian coefficients. We'll then convolve this with the image. The mask is calculated according to:
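A sketch of generating such a mask (this uses one common form of the Laplacian of Gaussian, up to scale, which may differ in constants from the worksheet's; the zero-sum normalisation echoes the mask normalisation function above):

import numpy as np

def log_mask(w, sigma):
    # Laplacian of Gaussian coefficients over a w x w window
    c = (w - 1) / 2.0
    y, x = np.mgrid[0:w, 0:w]
    r2 = (x - c) ** 2 + (y - c) ** 2
    mask = (r2 / sigma ** 2 - 2.0) * np.exp(-r2 / (2 * sigma ** 2))
    return mask - mask.mean()   # normalise so the coefficients sum to zero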

Let's have a look at some masks:

So we'll apply the second one to the eye image (pop off for a cup of tea if you're on a 486 or less...):

Remember that it's the zero-crossings which mark the edges, so the edges are unclear from this image.

We now need an operator to detect the zero crossings:
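One way to write such a detector outside Mathcad (our sketch: a point is marked where the sign changes between horizontally or vertically adjacent pixels):

import numpy as np

def zero_cross(pic):
    out = np.zeros(pic.shape, dtype=bool)
    out[:, 1:] |= np.signbit(pic[:, :-1]) != np.signbit(pic[:, 1:])
    out[1:, :] |= np.signbit(pic[:-1, :]) != np.signbit(pic[1:, :])
    return out.astype(np.uint8) * 255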

so let's have a look:

Now they are nice and clear! Note that edge regions are closed, whereas they are fragmentary in the Canny operator. This is one advantage of the Marr-Hildreth operator.

Let's see it in comparison with the Laplacian edges

hmm... this needs a different zero-crossing detector, or a bit of smoothing.

For clarity, let's draw it all together and apply it to the old eye image

Yup, all the low-level structure is there

Let's look at the Fourier transform of the earlier LoG operator of size 9x9:

We'll need to rearrange it:

And then take the transform as:

The transform is then

and as a picture:

This is circular-symmetric, as expected. Since the transform reveals that the LoG operator omits low and high frequencies (those close to the origin, and those far away from the origin), it is equivalent to a band-pass filter. Choice of the value of s controls the spread of the operator in the spatial domain and the 'width' of the band in the frequency domain: setting s to a high value gives low-pass filtering. What happens for large values of s?

Curvature is another low-level feature. Essentially, curvature is the rate of change of the tangent information. Accordingly, we can build its calculation into the hysteresis thresholding algorithm. When analysing connected pixels, we can form the difference in edge direction, which is an estimate of local curvature. Here, we will store the difference in edge direction information as a complex number: the real part of the resulting edge image holds the thresholded edge points and the imaginary part holds the estimate of curvature at that point. First we will redefine the connectivity analysis operator for curvature estimation as:

startdir is the edge direction at the current seed point

if the neighbour is o.k., label the point and work out the curvature

diffdir is the local estimate of curvature

stored as a complex number
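The essence of the idea, in a simplified (non-recursive) NumPy sketch: for each strong edge point, the difference in edge direction from its strongest 8-connected edge neighbour estimates the local curvature. The complex-number storage trick and the recursive connectivity are the worksheet's; the rest of this rendering is ours.

import numpy as np

def curvature_estimate(edge_mag, edge_dir, threshold):
    # edge_dir in radians, e.g. from arctan2 of the Sobel differences
    rows, cols = edge_mag.shape
    curv = np.zeros((rows, cols))
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            if edge_mag[y, x] <= threshold:
                continue
            best, diff = 0.0, 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if (dy, dx) == (0, 0):
                        continue
                    if edge_mag[y + dy, x + dx] > max(best, threshold):
                        best = edge_mag[y + dy, x + dx]
                        d = edge_dir[y + dy, x + dx] - edge_dir[y, x]
                        diff = (d + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi]
            curv[y, x] = diff
    return curv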

We also need to redefine the initial calling routine (to handle complex numbers) as:

Eventually we'll need to threshold the complex data, so our corn_thresh operator (sorry!) is:

Let's read in an image of the suits in playing cards:

and we'll use the Sobel operator on the Gaussian blurred image (as in the Canny operator) as:

and form the non-maximum suppressed image as:

measure the curvature via hysteresis thresholding as:

and extract the curvature as the imaginary part of the edge image

which gives the curvature as:

We can threshold this to find corners as:

We can also estimate curvature by differentiating the edge direction normal (perpendicular) to and parallel to the edge. This can be performed forwards or backwards, giving four different estimates of the curvature. The following code implements this differentiation process; corners can be located as maxima in the curvature information.

and derive the estimates of curvature:

So, let's have a look

Kq

Kpq

These suggest that there is not much difference between the positive and negative directions. However, differentiation perpendicular to the edge gives better results in this case. Try it with the eye image, and see what you get. Also, try thresholding the data and compare it with the result of edge detection.

We'll now move on to the last low-level feature of interest. This is called optical flow. It measures the motion between two images. So we'll start with a couple of images in which a ball is moving. The ball image is given by

We'll need an empty picture

So we can get our pictures of the ball as

Original circle

Moved circle

Difference between images

Note that the difference between images just tells us that something's changed. It doesn't tell us how it has changed. That's optical flow.

To get the flow, we need the horizontal difference which for two consecutive images is defined as

and the vertical difference

and the time difference

Averaging function, if you need it

So our resulting difference images are

We need a weighted averaging function which excludes the central point

Let's choose a value for the regularisation factor

Now we'll compute the flow, iteratively, as

Let's start things off with some null matrices

We'll use a regularisation of 0.3, for 8 iterations as
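For reference, a compact NumPy sketch of this iterative computation in the standard Horn-Schunck form (which may differ in detail from the worksheet's routine): the differentials, the weighted average which excludes the central point, and the update rule. Names and the 2x2 difference kernels are our choices; scipy.ndimage is an assumption.

import numpy as np
from scipy.ndimage import correlate

def optical_flow(p1, p2, reg=0.3, iters=8):
    p1, p2 = p1.astype(float), p2.astype(float)
    # horizontal, vertical and time differences between consecutive images
    kx = np.array([[-1.0, 1.0], [-1.0, 1.0]]) / 4.0
    ky = np.array([[-1.0, -1.0], [1.0, 1.0]]) / 4.0
    kt = np.ones((2, 2)) / 4.0
    ex = correlate(p1, kx) + correlate(p2, kx)
    ey = correlate(p1, ky) + correlate(p2, ky)
    et = correlate(p2, kt) - correlate(p1, kt)
    # weighted average which excludes the central point
    avg = np.array([[1.0, 2.0, 1.0],
                    [2.0, 0.0, 2.0],
                    [1.0, 2.0, 1.0]]) / 12.0
    u = np.zeros_like(p1)     # start things off with null matrices
    v = np.zeros_like(p1)
    for _ in range(iters):
        ub, vb = correlate(u, avg), correlate(v, avg)
        upd = (ex * ub + ey * vb + et) / (reg ** 2 + ex ** 2 + ey ** 2)
        u = ub - ex * upd
        v = vb - ey * upd
    return u, v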

and we'll look at the bottom left hand corner


That looks right. If you want to look at the largest elements of flow, you can threshold it. Note that only the largest vectors point in exactly the right direction. This is because some of the assumptions made for optic flow have been violated in the other regions.

Let's now do it for some real images. We'll read in two successive images of a person walking (these have been downsampled for speed, but it's actually Ping Huang, one of the first people to work on automatic gait recognition)

Now we'll generate the differentials as

Again, we need to start with some null matrices

and we'll go for 16 iterations with a much higher regularisation factor (suggesting that the brightness constraint is more important here)

We'll actually just look at the magnitude, as the arrows get rather densely packed for larger images

So here it is: not brilliant! We use a different technique for flow of moving people, one based on correlation (matching). The motion is actually too large for the differential technique, so the differential information is not sufficiently precise for the flow estimates to converge correctly.

So this completes our survey of basic low-level feature extraction techniques. You can now find the edges of objects by first- and second-order techniques, with a wide variety of choice in each. Which is best? Some are certainly more popular than others, but choice is necessarily a compromise between speed and accuracy, together with experimentation to determine which suits a particular application (which filtering in particular). Note that the Petrou and Spacek operators are not included here, but they are worth investigating in applications where their better properties may be required. As before, experiment with different window sizes, different filters and different images to understand the properties of these operators.


We have also got two other low-level descriptions: curvature and flow (for moving objects). Other techniques to compute these rely on finding things in the image. We don't have phase congruency, SIFT or saliency here: there's code available from the originators for that. That's what we'll move on to in the next Chapter: Feature Extraction (finding shapes).